Quick glimpse


We aim to keep fostering continuous integration and the development of highly reproducible workflows.

iMAP 1: Getting started workflow

…In Progress…




iMAP 2: Microbiome Bioinformatics

Root directory

.
├── config
│   ├── pbs-torque
│   └── slurm
├── dags
├── data
│   └── mothur
│       ├── logs
│       ├── process
│       │   ├── error_analysis
│       │   └── intermediate
│       ├── raw
│       ├── reads
│       └── references
├── envs
├── images
├── library
└── workflow
    ├── envs
    ├── rules
    ├── schemas
    └── scripts

21 directories
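
For readers who want to reproduce this layout in an empty project, the tree above can be recreated with a few lines of Python. This is only a minimal sketch: the directory names are copied from the tree, while the helper name `create_layout` and the use of a temporary directory in the demo are illustrative assumptions, not part of iMAP.

```python
import tempfile
from pathlib import Path

# Leaf directories copied from the tree above, relative to the project root.
LAYOUT = [
    "config/pbs-torque",
    "config/slurm",
    "dags",
    "data/mothur/logs",
    "data/mothur/process/error_analysis",
    "data/mothur/process/intermediate",
    "data/mothur/raw",
    "data/mothur/reads",
    "data/mothur/references",
    "envs",
    "images",
    "library",
    "workflow/envs",
    "workflow/rules",
    "workflow/schemas",
    "workflow/scripts",
]


def create_layout(root):
    """Create every directory in LAYOUT under root (hypothetical helper).

    Returns the sorted relative paths of all directories found below root,
    parent directories included.
    """
    root = Path(root)
    for relative in LAYOUT:
        (root / relative).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to a real project.
    with tempfile.TemporaryDirectory() as tmp:
        print(len(create_layout(tmp)), "directories")
```

Counting parent directories such as `data` and `data/mothur` along with the leaves gives the same total as the tree listing.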



Snakemake workflow

Note: Some Snakemake rules from existing workflows [3] are integrated with slight modifications.



16S classification options

Classify OTUs

  • Operational Taxonomic Units (OTUs) are clusters of similar sequences and are commonly accepted as analytical units in microbial profiling with 16S rRNA gene markers.
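
To make "clusters of similar sequences" concrete, here is a toy sketch of greedy, seed-based clustering on aligned sequences of equal length. It is not the algorithm mothur uses. The 0.97 default threshold is a widely used convention assumed here for illustration, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Group sequences into OTUs with a simple greedy, seed-based scheme.

    Each sequence joins the first OTU whose seed it matches at or above
    `threshold` identity; otherwise it becomes the seed of a new OTU.
    """
    otus = []  # list of (seed, members) pairs
    for seq in sequences:
        for seed, members in otus:
            if identity(seed, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing seed was similar enough: start a new OTU.
            otus.append((seq, [seq]))
    return [members for _, members in otus]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"]
    # A lower threshold is used here only because the toy reads are so short.
    print(greedy_otus(reads, threshold=0.9))
```

The result of a greedy scheme depends on input order, which is one reason production tools use more careful clustering methods.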

Classify Phylotypes

  • A phylotype in microbiome research is a DNA sequence or group of sequences sharing more than an arbitrarily chosen level of similarity of a 16S rRNA gene marker.

Classify ASVs

  • Amplicon Sequence Variants (ASVs) in microbiome research are inferred single DNA sequences recovered from a bioinformatics analysis of 16S rRNA marker genes.
  • In practice, an ASV is typically a cluster of sequences that differ from each other by one or two bases.

Classify Phylogenies

  • Microbial phylogenies are inferred from gene sequence homologies. Models of mutation determine the most likely evolutionary histories.
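
As one illustration of a mutation model, the sketch below computes the Jukes-Cantor (JC69) corrected distance between two aligned sequences. The choice of JC69 is an assumption made for the example; the text above does not say which model the workflow uses. The function names are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a, b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4/3 * p).

    The correction accounts for multiple substitutions at the same site,
    so d is larger than the observed proportion p whenever p > 0.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        # The formula is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    print(jc69_distance("ACGT", "ACGA"))
```

Tree-building methods then use such corrected distances, or the underlying likelihood model directly, to compare candidate evolutionary histories.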



Mothur Preliminary Analysis

The preliminary analysis (alpha_beta_diversity rule) is part of the bioinformatics analysis. It includes:

  • Counting reads for each group.
  • Subsampling for downstream analysis.
  • Rarefaction.
  • Computing Alpha diversity metrics.
  • Computing Beta diversity metrics.
  • Getting sample distances.
  • Constructing sample phylip tree.
  • Generating ordination matrices including PCoA and NMDS.
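
Three of the steps listed above (subsampling, an alpha diversity metric, and a beta diversity metric) can be sketched in plain Python. This illustrates the calculations only and is not the mothur implementation behind the alpha_beta_diversity rule. The Shannon index and Bray-Curtis dissimilarity were picked as common examples, and the sample tables in the demo are made up.

```python
import math
import random


def subsample(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a {taxon: count} table."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    result = {}
    for taxon in random.Random(seed).sample(pool, depth):
        result[taxon] = result.get(taxon, 0) + 1
    return result


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)), an alpha diversity metric."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples, a beta diversity metric."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    # Hypothetical read counts per taxon for two samples.
    sample_a = {"Bacteroides": 60, "Lactobacillus": 30, "Escherichia": 10}
    sample_b = {"Bacteroides": 20, "Lactobacillus": 20, "Prevotella": 40}
    depth = min(sum(sample_a.values()), sum(sample_b.values()))
    a, b = subsample(sample_a, depth), subsample(sample_b, depth)
    print(shannon(a), shannon(b), bray_curtis(a, b))
```

Subsampling both samples to the same depth before computing the metrics keeps differences in sequencing effort from being mistaken for differences in diversity.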



iMAP 3: Data tidying & transformation




iMAP 4: Data analysis & visualization




iMAP 5: Microbiome Machine Learning




Appendix

Reference Databases

  1. Mothur-based SILVA reference files [4].
  2. Mothur-based RDP reference files [5]. Note: The RDP database is used to classify 16S rRNA gene sequences to the genus level.
  3. ZymoBIOMICS Microbial Community Standard (Cat # D6306) [6]. The ZymoBIOMICS Microbial Community DNA Standard is designed to assess bias, errors, and other artifacts after the nucleic acid purification step.



Mermaid workflow template

Troubleshooting (in progress)

  1. Are chimeras removed by default in newer versions?
    • Yes. Chimeras are removed by default. You can still run the remove.seqs command without error, but it is not necessary. Removing chimeric sequences is explained here.
  2. Mothur dist.seqs is taking too long. Possible causes:
    • The merged reads are too long, probably over 300 bp.
    • The reads do not overlap when the paired reads are merged.
    • There are too many unique representative sequences, probably caused by the lack of overlap.
    • There is not enough computing power, which suggests using an HPC cluster.
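
A quick way to check the first three causes is to count the unique sequences and the overlong ones before running dist.seqs. The sketch below does this for FASTA-formatted text. It assumes an unaligned FASTA file, so sequence length equals read length, and it uses the 300 bp limit from the list above. The pair count n(n - 1)/2 is only a rough upper bound on the work, and the function names are hypothetical.

```python
def pairwise_comparisons(n_unique):
    """Number of distinct sequence pairs among n_unique sequences: n * (n - 1) / 2."""
    return n_unique * (n_unique - 1) // 2


def summarize_fasta(lines, max_length=300):
    """Summarize FASTA-formatted lines: unique sequences, overlong ones, and pair count."""
    sequences, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header closes the previous record, if any.
            if current:
                sequences.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.append("".join(current))
    unique = set(sequences)
    return {
        "unique": len(unique),
        "too_long": sum(len(s) > max_length for s in unique),
        "pairs": pairwise_comparisons(len(unique)),
    }


if __name__ == "__main__":
    # 100,000 unique sequences already imply about 5 billion pairs.
    print(pairwise_comparisons(100_000))
```

Because the pair count grows quadratically, halving the number of unique sequences cuts the work to roughly a quarter, which is why fixing read overlap usually helps more than adding hardware.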



References

1. Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. (2021). Sustainable data analysis with snakemake. F1000Research, 10. https://doi.org/10.12688/f1000research.29032.2
2. Snakemake. (2023). Snakemake. https://snakemake.readthedocs.io/en/stable
3. Close, W. L. (2020). Mothur 16S v4 analysis pipeline. https://github.com/wclose/mothurPipeline
4. Mothur-based SILVA reference files. https://mothur.org/wiki/silva_reference_files/
5. Mothur-based RDP reference files. https://mothur.org/wiki/rdp_reference_files/
6. ZymoBIOMICS microbial community DNA standard (cat # D6306). https://www.zymoresearch.com/zymobiomics-community-standard